Latin Hypercubes: A Class of Multidimensional Declustering Techniques
Authors
Abstract
The I/O subsystem is widely accepted as one of the principal bottlenecks for high-performance parallel database systems. The emergence of parallel I/O architectures has made data declustering, i.e. fragmenting a file of records and allocating the pieces to different disks, a problem of prime importance, as the growing activity in this area shows. In this study we focus only on multi-attribute declustering methods that are based on some type of grid-based partitioning of the data space. Since the multidimensional range query is the main workhorse for applications accessing such data, the focus is on providing efficient support for it. We first show that no declustering method is strictly optimal for range queries if the number of disks is greater than 5. The focus is therefore on declustering methods that provide good average-case performance and are also optimal for a large class of queries. A class of multidimensional declustering methods, called Latin Hypercubes, is proposed, and conditions under which this class is optimal are derived. Worst-case and average-case bounds on multidimensional range query performance are also provided. A detailed experimental evaluation compares the class with other declustering methods, varying the shape and size of queries, the database size, the number of attributes, and the number of disks. Our findings (theoretical and experimental) show that Latin hypercubes do very well (near optimal) for large queries and for partial match queries, and are within reasonable bounds of other declustering methods for small queries. Since no declustering method can perform optimally for all possible range queries, our findings help decide when to use this class of methods. Finally, since there is no clear winner, parallel database systems must support a number of declustering methods, and Latin Hypercubes would invariably have to be one of them.
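The abstract does not spell out how a Latin hypercube allocation is built, so the following is only a minimal sketch under a stated assumption: it uses the classic Disk Modulo rule, disk = (i_1 + ... + i_d) mod M, as a stand-in. On an M × ... × M grid this rule places distinct disks on the M cells of every axis-parallel line, i.e. it has the Latin property, but it is not claimed to be the paper's construction. A range query's cost is modelled as the largest number of its grid cells stored on any one disk, compared against the bound ceil(|Q| / M). All function names (`disk_modulo`, `response_time`, `optimal_time`, `is_latin`, `count_suboptimal`) are hypothetical.

```python
from itertools import product
from math import ceil, prod


def disk_modulo(cell, num_disks):
    """Assumed allocation (Disk Modulo): sum of the cell's grid indices mod M."""
    return sum(cell) % num_disks


def response_time(lo, hi, num_disks, allocate=disk_modulo):
    """Parallel I/O cost of the range query lo..hi (inclusive bounds):
    the largest number of query cells stored on any single disk."""
    counts = [0] * num_disks
    for cell in product(*(range(a, b + 1) for a, b in zip(lo, hi))):
        counts[allocate(cell, num_disks)] += 1
    return max(counts)


def optimal_time(lo, hi, num_disks):
    """Lower bound ceil(|Q| / M): every disk holds an equal share of the query."""
    size = prod(b - a + 1 for a, b in zip(lo, hi))
    return ceil(size / num_disks)


def is_latin(side, dims, num_disks, allocate=disk_modulo):
    """Check that any num_disks consecutive cells along any axis of a
    side^dims grid land on pairwise distinct disks."""
    for cell in product(range(side), repeat=dims):
        for axis in range(dims):
            if cell[axis] + num_disks > side:
                continue  # window would run off the grid
            window = {
                allocate(cell[:axis] + (cell[axis] + k,) + cell[axis + 1:], num_disks)
                for k in range(num_disks)
            }
            if len(window) != num_disks:
                return False
    return True


def count_suboptimal(side, dims, num_disks, allocate=disk_modulo):
    """Enumerate every axis-aligned range query on a side^dims grid and count
    those whose response time exceeds the ceil(|Q| / M) bound."""
    intervals = [(a, b) for a in range(side) for b in range(a, side)]
    bad = total = 0
    for bounds in product(intervals, repeat=dims):
        lo = tuple(a for a, _ in bounds)
        hi = tuple(b for _, b in bounds)
        total += 1
        if response_time(lo, hi, num_disks, allocate) > optimal_time(lo, hi, num_disks):
            bad += 1
    return bad, total


if __name__ == "__main__":
    M = 4
    print(is_latin(M, 2, M))  # Latin property on a 4 x 4 grid
    print(response_time((0, 0), (1, 1), M), optimal_time((0, 0), (1, 1), M))  # small square query
    print(response_time((0, 0), (0, 3), M), optimal_time((0, 0), (0, 3), M))  # partial match: first attribute fixed
    print(count_suboptimal(M, 2, M))  # (suboptimal queries, all queries)
```

With M = 4 on a 2-D grid, the 2 × 2 query at the corner costs 2 rounds against a bound of 1, while a full row (a partial match query with the first attribute fixed) and the whole 4 × 4 grid both meet the bound. The brute-force counter is deliberately naive; it is meant for checking small grids by hand, not for measuring realistic database sizes.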
Similar resources
A General Construction for Space-filling Latin Hypercubes
Abstract: We propose a general method for constructing Latin hypercubes of flexible run sizes for computer experiments. The method makes use of arrays with a special structure and Latin hypercubes. By using different such arrays and Latin hypercubes, the proposed method produces various types of Latin hypercubes including orthogonal and nearly orthogonal Latin hypercubes, sliced Latin hypercube...
Study of Scalable Declustering Algorithms for Parallel Grid Files
Efficient storage and retrieval of large multidimensional datasets is an important concern for large-scale scientific computations such as long-running time-dependent simulations which periodically generate snapshots of the state. The main challenge for efficiently handling such datasets is to minimize response time for multidimensional range queries. The grid file is one of the well known acce...
Efficient retrieval of multidimensional datasets through parallel I/O
Many scientific and engineering applications process large multidimensional datasets. An important access pattern for these applications is the retrieval of data corresponding to ranges of values in multiple dimensions. Performance is limited by disks largely due to high disk latencies. Tiling and distributing the data across multiple disks is an effective technique for improving performance th...
Latin k-hypercubes
We study k dimensional Latin hypercubes of order n. We describe the automorphism groups of the hypercubes and define the parity of a hypercube and relate the parity with the determinant of a permutation hypercube. We determine the parity in the orbits of the automorphism group. Based on this definition of parity we make a conjecture similar to the Alon-Tarsi conjecture. We define an orthogonali...
Concentric Hyperspaces and Disk Allocation for Fast Parallel Range Searching
Data partitioning and declustering have been extensively used in the past to parallelize I/O for range queries. Numerous declustering and disk allocation techniques have been proposed in the literature. However, most of these techniques were primarily designed for two-dimensional data and for balanced partitioning of the data space. As databases increasingly integrate multimedia information in ...
Journal title:
Volume, issue:
Pages: -
Publication date: 1994